Do you remember the animated plots we produced in the introductory lecture for this course, based on Hans Rosling's animated Gapminder visualization?
In this worked example, we’ll work out how to reproduce that plot as both an animated and an interactive visualization.
The dataset that we’ll use is available via the gapminder package. So go ahead and install that package.
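The installation and loading commands are not shown at this point, so here is a minimal sketch of what they might look like. The conditional install is our own addition so the snippet can be re-run safely; install.packages(), library(), and head() are standard R functions.

```r
# Install gapminder only if it is not already available (assumed workflow).
if (!requireNamespace("gapminder", quietly = TRUE)) {
  install.packages("gapminder")
}

# Attaching the package exposes the `gapminder` data frame.
library(gapminder)

# Peek at the first few rows.
head(gapminder)
```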
Loading the package makes the dataset available in an object called gapminder. These are the first few rows of the dataset.
## # A tibble: 6 × 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.
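The variables are largely self-explanatory: lifeExp is life expectancy at birth in years, pop is the total population, and gdpPercap is GDP per capita in inflation-adjusted US dollars.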
Let’s jump right in and create a bubble plot faceted on year (which we cut into groups), with population mapped to the size of the bubbles, and GDP per capita and life expectancy on the x and y axes respectively.
library(tidyverse)

gapminder %>%
  mutate(years = cut_interval(year, length = 5)) %>%
  ggplot(aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point(alpha = 0.5) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap("years") +
  labs(
    y = "Life Expectancy",
    x = "GDP per Capita",
    size = "population"
  )

Life Expectancy and GDP per capita from 1950 to 2010.
Just as we said in the first lecture, this visualization is not (yet) working out so well for us. Let’s make it animated instead. For this, we’ll use the gganimate package. First install the package.
To use the gganimate package you also need a renderer to produce animated images. You can use either gifski or ImageMagick. We recommend the former (and gganimate defaults to gifski if it is installed), but either will work just fine. Run one (or both) of the following lines to install a renderer.
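The installation lines referred to above are not shown here, so the following is a sketch of what they could look like. It assumes that the ImageMagick route goes through the magick R package (which in turn may need the ImageMagick system library); the conditional checks are our own addition.

```r
# Install gganimate itself if it is missing.
if (!requireNamespace("gganimate", quietly = TRUE)) {
  install.packages("gganimate")
}

# Recommended renderer: gifski.
if (!requireNamespace("gifski", quietly = TRUE)) {
  install.packages("gifski")
}

# Alternative renderer (assumption: ImageMagick via the magick package).
# Uncomment if you prefer it or want both.
# install.packages("magick")
```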
We build the plot as before, but now animate it by adding transition_time() to the plot, and we use the title label to show the current year. The {frame_time} placeholder in the title is filled in with the time that each frame represents.
library(gganimate)

ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point(alpha = 0.5) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  labs(
    title = "Year: {frame_time}", # special glue syntax
    y = "Life Expectancy",
    x = "GDP per Capita",
    size = "population"
  ) +
  transition_time(year)

GDP per capita and life expectancy for some of the countries of the world.
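If you want explicit control over rendering, or want to keep the result as a file, something along these lines may help. This is a sketch, assuming gifski is installed; the frame count, frame rate, and the file name gapminder.gif are illustrative choices of ours, not values taken from the text.

```r
library(ggplot2)
library(gapminder)
library(gganimate)

# Store the animated plot specification instead of printing it.
p <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point(alpha = 0.5) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  labs(title = "Year: {frame_time}") +
  transition_time(year)

# Render with an explicitly chosen renderer (illustrative settings).
anim <- animate(p, nframes = 50, fps = 10, renderer = gifski_renderer())

# Write the rendered animation to disk (hypothetical file name).
out_file <- "gapminder.gif"
anim_save(out_file, animation = anim)
```

Fewer frames render faster while you are still tweaking the plot; you can raise nframes once you are happy with it.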
If you think the plot is still crowded, we could alternatively use
facets to separate continents. Here we also make use of the
country_colors object that is included in the
gapminder package.
ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = country)) +
  geom_point(alpha = 0.5) +
  # guide = "none" hides the legend (guide = FALSE is deprecated in recent ggplot2)
  scale_colour_manual(values = country_colors, guide = "none") +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(
    title = "Year: {frame_time}",
    x = "GDP per capita",
    y = "Life expectancy"
  ) +
  transition_time(year)

GDP per capita and life expectancy; now with facets!
So far our plot does a good job of showing the trends among the various continents of the world but is hard to use if we are interested in one specific country. A remedy for this can be to use labels to let us identify which bubble belongs to which country. The large number of countries, however, means that it’s not a frightfully good idea to label all of them.
Instead, we’ll pick out the largest two countries (at the latest time
stamp) on each continent and label those. First, we store the names of
the countries in a vector, large_country_names.
The following steps first filter the dataset so that only observations from the latest year (max(year)) are kept, then group the dataset by continent, then slice it so that the observations (countries) with the largest and second-largest population (pop) in each group (continent) are kept, and finally pull out (using pull()) the country names.
large_country_names <-
  gapminder %>%
  filter(year == max(year)) %>%
  group_by(continent) %>%
  slice_max(pop, n = 2) %>%
  pull(country)

large_country_names

## [1] Nigeria       Egypt         United States Brazil        China
## [6] India         Germany       Turkey        Australia     New Zealand
## 142 Levels: Afghanistan Albania Algeria Angola Argentina Australia ... Zimbabwe
Then we filter the original dataset to create a separate dataset for our labels.
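The code for this filtering step is not shown here. A minimal sketch, assuming the result is stored under the name large_countries (the name used by the plotting code that follows), could look like this; large_country_names is recomputed so that the snippet is self-contained.

```r
library(dplyr)
library(gapminder)

# Names of the two most populous countries per continent in the latest year.
large_country_names <- gapminder %>%
  filter(year == max(year)) %>%
  group_by(continent) %>%
  slice_max(pop, n = 2) %>%
  pull(country)

# Keep every year of data for those countries only; this is the label dataset.
large_countries <- gapminder %>%
  filter(country %in% large_country_names)
```

Keeping all years, not only the latest one, matters here because the labels need a position in every frame of the animation.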
Now we put everything together; this time we also change the
easing of aesthetics from linear to cubic in-and-out using
ease_aes(), to more clearly show that we actually only have
data on a 5-year interval here. We label the countries with
geom_label_repel() from the ggrepel
package, in order to avoid overlapping labels.1
library(ggrepel)

ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = country)) +
  geom_point(alpha = 0.5) +
  geom_label_repel(
    aes(gdpPercap, lifeExp, label = country),
    inherit.aes = FALSE,
    seed = 1, # important when animating
    nudge_x = 5,
    nudge_y = -10,
    data = large_countries
  ) +
  # guide = "none" hides the legend (guide = FALSE is deprecated in recent ggplot2)
  scale_colour_manual(values = country_colors, guide = "none") +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(
    title = "Year: {frame_time}",
    x = "GDP per capita",
    y = "Life expectancy"
  ) +
  transition_time(year) +
  ease_aes("cubic-in-out")